
Tom at 16

Week of March 3rd, 2003

Latest Update: Sunday March 9, 2003 07:20

Sunday March 9, 2003


Good morning. The time is 07:20 CST. The temperature is a balmy -36 degrees C. The wind is from the west at about 15 km/h, which makes the outside temperature this morning an estimated -45 C. Actually the 'C' or 'F' designation doesn't much matter today. For those who don't know, the two temperature scales meet at -40. Put another way, it's simply cold. And as Forrest Gump would say, "That's all I have to say about that."
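For the curious, the -40 crossover falls straight out of the standard conversion formula. A minimal sketch (the function names are just illustrative):

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def f_to_c(fahrenheit):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


# The scales meet where C == F:
#   c = c * 9/5 + 32   ->   -4/5 * c = 32   ->   c = -40
print(c_to_f(-40))  # -40.0
print(f_to_c(-40))  # -40.0
```

So at this morning's -36 C the two readings are only a few degrees apart, and by -40 they are the same number.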

Last night was "date night" for Leah and me. We got a babysitter, and went out for sushi and a movie. We don't eat out much simply because we tend to be disappointed when we do. Without trying to blow our own horn, we like to cook, and we're pretty good at it. The bottom line is we can make a meal as nice as, if not better than, what we can get by going out. And it costs a whole lot less. Sushi is an exception. Making your own sushi is an extremely fiddly and time-consuming process. We've done it in the past, but it's just not worth the effort. So when we do go out for a meal, we often go to our favorite sushi restaurant downtown.

The movie was Tears of the Sun. Definitely a guy flick. I enjoyed it. Hollywood license aside, some of the cinematography was excellent. Leah was neutral -- parts were a tad too violent (and real) for her. Oh well. She gets to make the choice next time.

Today I've got some work to do on the company server in Indianapolis. I'm trying to make a very conscious effort to not only work realistic hours Monday to Friday, but avoid work altogether on the weekends. I'm going to make an exception to the rule in this case -- it makes sense to do what I need to do while the office is empty and I have lots of bandwidth on the DSL line.


Friday March 7, 2003


TGIF

[tom@hydras]:/export/home/tom # uptime
  09:59PM   up 312 days,   4:02,  2 users,  load average: 0.15, 0.04, 0.02

The forecast tonight is for a nominal temp of -38C. With the wind chill factored in, the temp is expected to fall in the -45 to -47C range. Bleh is an understatement. I think I'll go throw another log on the fire...


Thursday March 6, 2003


I just watched Bush's Presidential News Conference. Correct me if I'm wrong, but I do believe the President of the United States just cocked the trigger on a war with Iraq. The gun's been pointed for a long time now, but the barrel now contains a live round. We all have much to think about; much to reflect on. I personally do not think this pending war is going to be as surgical as many imply. Having said that, however, I admit that predicting the future is not one of my strong suits. As Aaron Brown would say, "I guess we'll just have to wait and see..."

A follow-up on my Kronk saga: Thanks to Svenson and David Thorarinsson, who pointed out that the USB chipset on my board could be an old one -- one that has caused problems with Linux in the past. I plan to test the theory this weekend by disabling the on-board USB (via the BIOS) and doing some tests with the PCI USB 2.0 card supplied with the box. "I guess we'll just have to wait and see..."

I recognize I may have offended the sensibilities of some of my readers with my "sweeping" statements of Tuesday. Understand, I'm frustrated (I do NOT like hardware problems), I've experienced the worst form of pause I can think of on a computer (filesystem corruption), and the path to resolution is not clear-cut. Such is life, I guess. So let me rephrase: Monday I experienced a problem with Kronk. It appears to be related to USB, and the kernel(s) I've been running. I've never experienced similar problems in two years of running similar configurations. As pointed out by some of my more astute readers, it might have nothing to do with AMD and everything to do with the on-board USB controller on my motherboard. I've arrested the damage by unplugging my USB mouse. Mmmm. "I guess we'll just have to wait and see..."

If anyone out there's had success with a 3rd party implementation (i.e., not "official" KDE) of KDE 3.1 for Red Hat 8.0, please send me a URL. I've tried -- without success -- several different "hacks", none to my satisfaction. I don't want to have to go out and resolve library issues. I want the solution to provide me with all the RPMs I need, and I want to be able to inject said solution without fuss or muss. If you have direct experience with an RPM collection that's worked for you, I'm interested. So are several of my readers.


Wednesday March 5, 2003


You won't get a lot out of me tonight, I'm afraid... I've got a Kaulb in my Knose and I've got a bad case of "general aches and pains". I get about one good cold a year. It appears this is the one.

I have temporarily resolved my filesystem corruption problems reported yesterday. I don't have a complete solution in hand, but I do appear to have a reliable place to store data and applications again.

The problem was the USB device I had plugged into my computer. Yes, yes, I know I said I didn't have any USB devices in use. I didn't think I did. But I overlooked my MS USB mouse -- subtle, but significant. I moved that to a regular PS/2 port, did a clean install, and all was well and good again.

It would appear that the Linux kernel has problems with USB. The Linux kernel I was using, that is (2.4.18 with the latest RH patches). I don't know if the problem's been fixed with newer kernels. I also don't know how having a USB mouse plugged into my computer caused me to experience data corruption on one of my partitions, notably during high I/O operations. I do know moving the mouse from USB to PS/2 rectified the problem. Who'da thunk.

I can also tell you I've run the same setup on another box (USB mouse, RH 8.0, and EXT3 filesystems) with no problems whatsoever. Which makes me think the Athlons in Kronk are playing a role as well. What role, I can't say. Testing, tweaking, research, playing with various kernels and kernel options... it all takes time, and I don't have a lot of that just right now. I have work to do. Work that demands I'm able to install a stock out-of-the-box distribution quickly, without a lot of fuss, and without a lot of patches, then try and mount an enterprise app on top of everything. Which means I don't have time to tweak custom kernels, alternate filesystems, etc. Such is life in the world of corporate journalism.

I'm going to take two aspirin. Call me in the morning ;-)


Tuesday March 4, 2003


Yesterday was not, to use the industry vernacular, a Good Day.

My development box, Kronk, was (<- foreboding word) set up with two versions of Red Hat. RH 8.1 (beta) lived on the front of the drive and comprised three partitions: a 100M /boot; a 1G swap; and a 20G root. On hda6 I had a single 20G partition where RH 8.0 lived. On hda5 I had a 20G /home partition which RH 8.1 automatically mounted; I manually mounted it when I needed it under RH 8.0. I kept all the installation files I needed under /home/admin -- that way I didn't have to fart around with CDs. With me so far? Good...

I use EXT3 filesystems on all my Linux boxes. I've been using EXT3 for over two years, and have never once had a problem. When a box locks hard, I simply power it off and on again. fsck does its thing on reboot, and everything is hunky and dory. Until yesterday.

When I sat down at my desk yesterday morning, Kronk was running RH 8.0. I was doing an emerge -u world on Phaedrus and I wanted to check email. I didn't have Mulberry installed under RH 8, but I did have it configured under RH 8.1 so I rebooted the box (actually, I could have just mounted hda5 over /home/tom under RH 8, but I didn't think of that at the time). RH 8.1 came up and immediately complained about some filesystem errors on hda5 during the mount process. Huh? fsck tried to fix the problem, but couldn't, and dropped me into single-user mode. I tried to repair things manually, but after an hour I gave up. The sheer number of errors made the task impossible. Even if I had been able to right things, I wouldn't have trusted the data on the partition.

So WTF Huck?

I sat there for a good hour thinking about the problem before me. I wasn't too concerned about the data. It would be a pain to have to copy it all back again, but I did have the directories mirrored on another system. What really troubled me was how I ended up with such a badly corrupted filesystem. A bug in RH 8.1? Possible, I suppose. A bug in rsync (the app I used to move the data from the backup drive to hda5 in the first place)? But why did the corruption show up when it did and not before? A bug in the kernel? Again, possible, but I had updated the out-of-the-box kernel to the latest errata release... Differing versions of mke2fs between RH 8 and 8.1? I was stumped. As noted, I'd never had a corrupt EXT3 filesystem. I researched and Googled and came up dry -- nothing concrete. At least nothing that related directly to my particular HW/SW combination.

So I downloaded a utility from Seagate and checked the hard drive. A-Okay.

Then I dug out my Acronis drive wiper diskette and wrote zeros to the whole hard disk.

Then I did a clean install of RH 8.0, which completed without error. I was in the process of rsyncing back the contents of /home/admin from my other drive when the system locked hard. When I rebooted, I had file corruption again. Not as bad as earlier in the day, but file corruption nonetheless.

So I wiped the drive again, and did ANOTHER clean install of RH 8.0. Again, the installation completed without error. This time I used scp to restore the contents of /home/admin. Halfway through the process the kernel blew up. The message was "kernel bug in usb-ohci.h". A Google search on the error turned up results on one of the kernel mailing lists that pointed back to a NEC USB 2.0 PCI card. I checked, and sure enough, that's the card I had in my machine. So I removed it, and did yet another clean install. I copied over a bunch of files from a different directory (not /home/admin) on another machine, and the kernel blew up again. Same error message.

Pardon the language, but Fuck. I'm not using anything USB on the system. The error appears to have something to do with intensive I/O. And when the kernel blows up, it corrupts my EXT3 file system.

To make a very long story (and day) short, I ended up digging out my RH 7.3 discs and installing that. It stuck. I copied /home/admin from my notebook over to Kronk. The process completed without error. Everything's running without incident. I applied a few key bits of errata. No problem.
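A copy that "completed without error" only says the copy tool was happy; it says nothing about whether the bytes on disk match the source. A minimal sketch of how a restored tree could be checked against its mirror follows, assuming both trees are reachable as local paths. The function names and the paths in the usage note are hypothetical, and this is not a record of anything actually run on Kronk.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(source_root, restored_root):
    """Return (missing, extra, changed) relative paths between two trees."""
    src = tree_digests(source_root)
    dst = tree_digests(restored_root)
    missing = sorted(set(src) - set(dst))   # in the source, not restored
    extra = sorted(set(dst) - set(src))     # restored, not in the source
    changed = sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k])
    return missing, extra, changed
```

Something like `compare_trees("/mnt/mirror/admin", "/home/admin")` (both paths made up for illustration) would return three lists. Anything in `changed` is a file whose contents differ between the mirror and the restored copy, which is exactly the kind of silent damage a clean-looking copy can hide.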

But there is a problem. I don't trust the box. I have no idea what caused the events of yesterday to transpire. The hard disk checks out. I've been using RH 8.0 and EXT3 on systems around here -- and elsewhere -- since it was in early beta, and haven't had a nit of trouble on any other machine. The only thing different about this box is the dual AMD processors. But I installed the latest Athlon errata kernels, and there's nothing in any of the stuff online I've read to indicate the error I got had anything to do with the AMD architecture -- the usb-ohci error has been documented on P2 boxes as well. But why does the error corrupt the filesystem? I don't understand. I do not like it when I don't understand something, especially when that something gets in the way of me doing my job.

I'm not amused.

And to sour my mood further still, it's forty below outside. Everyone in our household is in agreement on one thing: We're really, really, really tired of the cold. Enough winter already. The only shining ray of hope I see on my horizon today is, "In like a lion, out like a lamb..."


Send questions or comments about this site to webmaster@syroidmanor.com.
Copyright © 1998-2003 Tom Syroid. All Rights Reserved
